Genomics Data Curation Roles, Skills, and Perception of Data Quality
Abstract
Compared to a decade ago, genomics scientists, driven by technical changes and the availability of massive genomic data, now perform a wider range of curation roles, including those of end-users, curators, or dual-role users. Scientists with different curation roles (including that of end-user) may focus on different data quality aspects and skill requirements in a community curation environment. This study examines how genomics scientists' perceived priorities for data quality and data quality skills differ according to the role they assume in genomics data curation work. The analysis of survey data collected from 147 genomics scientists found that curators of genomic data placed a higher value on quality criteria that can be assessed through direct examination of the data, while end-users placed a high value on quality criteria that can only be assessed indirectly, such as believability. With regard to data quality skills, curators appeared to care more than end-users about understanding users' requirements and about specific data management skills, while end-users placed a higher value on the skills needed to deal with information overload, that is, to identify useful, relevant information from large amounts of data. The study found that scientists with different curation roles, given common curation tasks with the same skill requirements, prioritized different data quality criteria. The data quality priorities, skill priorities, and tradeoffs identified by this study can inform the development of effective data curation mandates and policies, data quality assurance planning and training, and the design of curation-role-specific tool dashboards and visualization interfaces for genomics data.
Similar resources
Domain knowledge and data quality perceptions in genome curation work
Purpose-This article aims at understanding genomics scientists' perceptions in data quality assurances based on their domain knowledge. Design/methodology/approach-The study used a survey method to collect responses from 149 genomics scientists grouped by domain knowledge. They ranked the top-five quality criteria based on hypothetical curation scenarios. The results were compared using Chi-Squ...
Prioritization of data quality dimensions and skills requirements in genome annotation work
The rapid accumulation of genome annotations, as well as their widespread reuse in clinical and scientific practice, poses new challenges to management of the quality of scientific data. This study contributes towards better understanding of scientist perception and priorities for data quality and data quality assurance skills needed in genome annotation. Our study was guided by a previously de...
Big Data to Knowledge: Harnessing Semiotic Relationships of Data Quality and Skills in Genome Curation Work
This article aims to understand the views of genomics scientists with regard to the data quality assurances associated with semiotics and Data-Information-Knowledge (DIK). The resulting communication of signs generated from genomic curation work was found within different semantic levels of DIK that correlate specific data quality dimensions with their respective skills. Syntactic DQ dimension...
Study of the foundation, models and issues of research data curation and management in scientific and academic environments
Background and Aim: The purpose of this paper is to study, identify, and discuss the foundation and concepts, models and frameworks, and dimensions and challenges of research data curation and management in scientific and academic environments. Method: This is a review article, and the library method was used to collect scientific and research texts in this field. In this research, external an...
From manual curation to visualization of gene families and networks across Solanaceae plant species
High-quality manual annotation methods and practices need to be scaled to the increased rate of genomic data production. Curation based on gene families and gene networks is one approach that can significantly increase both curation efficiency and quality. The Sol Genomics Network (SGN; http://solgenomics.net) is a comparative genomics platform, with genetic, genomic and phenotypic information ...
Publication date: 2014